I mean, they can be, but only under the heroic assumption that the groups are similar on average before any adjustment
Otherwise, we will have selection bias
Let’s see how this works in practice! 🤓
Khuzdar and Maria
Selection bias occurs when the treated and untreated groups would have different outcomes even without treatment
This can happen for many reasons, but first let’s see the (simple) maths behind it
Using an example from Angrist and Pischke (2021), let’s say we have a student called Khuzdar from Kazakhstan, who is considering studying in the US and is worried about the cold weather
Should he get health insurance? 🤔
Let’s imagine that, without insurance, Khuzdar has a potential outcome of \(Y_{0,\text{Khuzdar}} = 3\) and, with insurance, \(Y_{1,\text{Khuzdar}} = 4\). So the treatment effect is \(\tau_{\text{Khuzdar}} = 1\), that is, he gains 1 “health point” by getting insurance
Now, let’s imagine that Khuzdar has a Chilean colleague called Maria Moreno, who is also considering studying in the US
But since she comes from chilly Santiago, she is not worried about the cold weather
So, without insurance, Maria has a potential outcome of \(Y_{0,\text{Maria}} = 5\) and, with insurance, \(Y_{1,\text{Maria}} = 5\). So the treatment effect is \(\tau_{\text{Maria}} = 0\), that is, she gains no “health points” by getting insurance
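Suppose Khuzdar buys insurance and Maria does not: we then observe \(Y_{\text{Khuzdar}} = 4\) and \(Y_{\text{Maria}} = 5\), a difference of \(-1\)
Taken at face value, insurance looks harmful!
In fact, the comparison between frail Khuzdar and hearty Maria tells us little about the causal effects of their choices!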
Why is that? Because they were different to begin with
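Let’s do a little mathematical trick here: we will add and subtract \(Y_{0,\text{Khuzdar}}\) in the observed difference between insured Khuzdar and uninsured Maria (the two terms cancel each other out, right?)
So we have the following:
\[
\begin{aligned}
Y_{\text{Khuzdar}} - Y_{\text{Maria}} &= Y_{1,\text{Khuzdar}} - Y_{0,\text{Maria}} \\
&= \underbrace{Y_{1,\text{Khuzdar}} - Y_{0,\text{Khuzdar}}}_{1} + \underbrace{Y_{0,\text{Khuzdar}} - Y_{0,\text{Maria}}}_{-2} = -1
\end{aligned}
\]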
What is the second term here?
Difference in means = average treatment effect + selection bias
The second term is the selection bias!
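To make the arithmetic concrete, here is a minimal Python sketch of the decomposition. It assumes, as in the Angrist and Pischke example, that Khuzdar buys insurance and Maria does not; the names `potential`, `insured` and `observed` are made up for this illustration.

```python
# Potential outcomes from the Khuzdar and Maria example (in "health points").
potential = {
    "Khuzdar": {"y0": 3, "y1": 4},
    "Maria": {"y0": 5, "y1": 5},
}

# Assumption for this sketch: Khuzdar is insured, Maria is not.
insured = {"Khuzdar": True, "Maria": False}


def observed(name):
    """Return the outcome we actually see, given the insurance choice."""
    key = "y1" if insured[name] else "y0"
    return potential[name][key]


# What a naive comparison of the two students would report.
observed_difference = observed("Khuzdar") - observed("Maria")

# Causal effect of insurance on the insured student.
treatment_effect = potential["Khuzdar"]["y1"] - potential["Khuzdar"]["y0"]

# How different the two students are without insurance.
selection_bias = potential["Khuzdar"]["y0"] - potential["Maria"]["y0"]

print(observed_difference, treatment_effect, selection_bias)  # -1 1 -2
```

The observed difference (-1) equals the treatment effect (1) plus the selection bias (-2), so the naive comparison even gets the sign wrong.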
The same is true for averages: the difference in means is the average treatment effect plus the selection bias
Imagine that we have a dummy variable \(D_i\) for treatment, which takes the value 1 if the unit is treated (in our case, insured) and 0 otherwise
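Thus:
\[
\text{Difference in means} = Avg_n[Y_i|D_i = 1] - Avg_n[Y_i|D_i = 0] = Avg_n[Y_{1i}|D_i = 1] - Avg_n[Y_{0i}|D_i = 0]
\]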
So far, so good, right?
If we assume that the treatment has a constant effect (i.e. the treatment effect is the same for everyone), we can write each unit's potential outcomes as:
\[
Y_{1i} = Y_{0i} + k
\]
Where \(k\) is both the individual and average causal effect of insurance on health
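Using the constant-effects model to substitute for \(Avg_n[Y_{1i}|D_i = 1]\), we have
\[
\begin{aligned}
Avg_n[Y_{1i}|D_i = 1] - Avg_n[Y_{0i}|D_i = 0] &= \{k + Avg_n[Y_{0i}|D_i = 1]\} - Avg_n[Y_{0i}|D_i = 0] \\
&= k + \{Avg_n[Y_{0i}|D_i = 1] - Avg_n[Y_{0i}|D_i = 0]\}
\end{aligned}
\]
The term in braces is the selection bias: the difference in average \(Y_{0i}\) between the insured and the uninsured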
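The same identity can be checked on simulated data. The sketch below is only an illustration under made-up assumptions: a constant effect `k = 1`, normally distributed health without insurance, and a selection rule in which frailer people are more likely to buy insurance. None of these numbers come from the original example.

```python
import random
from statistics import mean

random.seed(42)

k = 1.0     # hypothetical constant treatment effect
n = 10_000  # hypothetical sample size

units = []
for _ in range(n):
    y0 = random.gauss(4.0, 1.0)  # health without insurance
    # Assumed selection rule: frailer people insure more often.
    p_insure = 0.8 if y0 < 4.0 else 0.2
    d = 1 if random.random() < p_insure else 0
    y1 = y0 + k                  # constant-effects model
    y = y1 if d == 1 else y0     # we only observe one potential outcome
    units.append((d, y0, y))

treated_y = [y for d, _, y in units if d == 1]
control_y = [y for d, _, y in units if d == 0]
treated_y0 = [y0 for d, y0, _ in units if d == 1]
control_y0 = [y0 for d, y0, _ in units if d == 0]

# Naive comparison of observed outcomes.
diff_in_means = mean(treated_y) - mean(control_y)

# Difference in average Y0 between insured and uninsured.
# Only computable here because the simulation exposes both potential outcomes.
selection_bias = mean(treated_y0) - mean(control_y0)

print(f"difference in means: {diff_in_means:.3f}")
print(f"k + selection bias:  {k + selection_bias:.3f}")
```

Because the insured start out frailer, the selection bias is negative and the difference in means understates the true effect `k`. With real data we never see \(Y_{0i}\) for the insured, which is exactly why the bias cannot be computed directly.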
How to check for selection bias?
Balance tests
We use balance tests (or randomisation checks) to check if the groups are similar before treatment
The idea is to compare the means of the covariates for the treated and untreated groups
If the means are similar, it provides evidence that nothing systematic, other than the treatment itself, is driving the difference in outcomes
We are never 100% sure, but we trust the random assignment
Small differences in means are acceptable, as long as they are not systematic
Why? Because some variation can happen only by chance
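Here is a rough sketch of what a balance check on one covariate could look like. Everything in it is hypothetical: the covariate (age), the sample size, the 50/50 random assignment, and the helper name `balance_row`. It reports the raw difference in means, a Welch-style t-statistic, and a standardised difference.

```python
import math
import random
from statistics import mean, variance

random.seed(0)

n = 10_000
# Hypothetical pre-treatment covariate (say, age) and random assignment.
ages = [random.gauss(25.0, 4.0) for _ in range(n)]
treated_flags = [random.random() < 0.5 for _ in range(n)]


def balance_row(x, d):
    """Compare the mean of covariate x between treated (d True) and control."""
    treated = [xi for xi, di in zip(x, d) if di]
    control = [xi for xi, di in zip(x, d) if not di]
    diff = mean(treated) - mean(control)
    # Welch standard error: the two groups may have unequal variances.
    se = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    # Standardised difference: the gap measured in pooled standard deviations.
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return {"diff": diff, "t_stat": diff / se, "std_diff": diff / pooled_sd}


row = balance_row(ages, treated_flags)
print(row)
```

Under random assignment the standardised difference should be close to zero. In practice you would repeat this for every pre-treatment covariate, and with many covariates a few "significant" differences are expected by chance alone.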
100% personal opinion: I think they are quite useless 😂